This version of the notebook has been edited to exclude the source code. You may contact the researchers if you wish to see the source code.
The objective of this study is to cluster heroes based on their game impact and identify alternative heroes most similar to a banned hero in the event that an opposing team disrupts a team's key strategy by banning their key hero.
For more info about the DPC, you can visit https://www.dota2.com/procircuit.
*NOTE: `premium` matches are DPC games, which are, in essence, professional DotA 2 matches.
# Imports needed by the cells below
import sqlite3
import pandas as pd

# Connecting to the database
conn = sqlite3.connect('dmwfinalproject.db')
c = conn.cursor()
# Select the dataframe from the database
df_ds = pd.read_sql('''SELECT * FROM dota2''', conn)
# Set hero_id as index for visualization
df_ds.set_index('hero_id').head()
print("The dataframe's shape is:", df_ds.shape)
| Features | Description |
|---|---|
| Categorical | |
| hero_id | The ID value of the hero played |
| match_id | The ID number of the match assigned by Valve |
| player_slot | Which slot the player is in. 0-127 are Radiant, 128-255 are Dire |
| Numerical | |
| ancient_kills | Total number of Ancient creeps killed by the player |
| assists | Number of assists the player had |
| camps_stacked | Number of camps stacked |
| courier_kills | Total number of courier kills the player had |
| creeps_stacked | Number of creeps stacked |
| deaths | Number of deaths |
| denies | Number of denies |
| gold_per_min | Gold Per Minute obtained by this player |
| gold_spent | How much gold the player spent |
| hero_damage | Hero Damage Dealt |
| hero_healing | Hero Healing Done |
| kda | Kill-Death-Assist ratio |
| kills | Number of kills |
| last_hits | Number of last hits |
| level | Level at the end of the game |
| neutral_kills | Total number of neutral creeps killed |
| obs_placed | Total number of observer wards placed |
| observer_kills | Total number of observer wards killed by the player |
| observer_uses | Number of observer wards used |
| roshan_kills | Total number of Roshan kills (last hit on Roshan) the player had |
| rune_pickups | Number of runes picked up |
| sen_placed | How many sentries were placed by the player |
| sentry_kills | Total number of sentry wards killed by the player |
| sentry_uses | Number of sentry wards used |
| stuns | Total stun duration of all stuns by the player |
| tower_damage | Total tower damage done by the player |
| tower_kills | Total number of tower kills the player had |
| xp_per_min | Experience Per Minute obtained by the player |
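
The `player_slot` encoding described in the table can be decoded with a small helper. This is a minimal sketch; `team_from_slot` is a hypothetical name and is not part of the original notebook.

```python
def team_from_slot(player_slot: int) -> str:
    """Map a player_slot value to its team, following the feature table."""
    if not 0 <= player_slot <= 255:
        raise ValueError(f"player_slot out of range: {player_slot}")
    # Slots 0-127 are Radiant; slots 128-255 are Dire.
    return "Radiant" if player_slot < 128 else "Dire"


print(team_from_slot(0))    # Radiant
print(team_from_slot(128))  # Dire
```
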
As mentioned in Section 3.2, the dataframe is composed of 15,260 rows and 31 columns. The data was scraped from the OpenDota API using the requests library, and the API response was then parsed using the json library.
The actual scraping code is hidden for privacy purposes; contact the researchers to see the scraper.
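Since the scraper is not shown, the following is only a rough sketch of the parsing step. It assumes a response body whose records sit under a `rows` key. The payload, the `rows` key, and the `parse_rows` helper are illustrative assumptions, not the researchers' code or a documented response shape.

```python
import json

# Hypothetical response body; the values are made up for illustration.
sample_body = (
    '{"rows": [{"match_id": 1, "hero_id": 5, "kills": 2},'
    ' {"match_id": 1, "hero_id": 7, "kills": 9}]}'
)


def parse_rows(body: str) -> list:
    """Parse a JSON response body and return its list of records."""
    payload = json.loads(body)
    # Assumed layout: records live under the "rows" key.
    return payload.get("rows", [])


records = parse_rows(sample_body)
print(len(records))
```
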
For replication of the study and easier reanalysis, the data was stored in a database named dmwfinalproject.
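As a sketch of the storage step, records could be written to and read back from SQLite with the standard `sqlite3` module. An in-memory database and a three-column table are used here for brevity. The actual `dota2` table has 31 columns, and the sample rows below are made up.

```python
import sqlite3

# In-memory database for illustration; the study used a database file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dota2 (match_id INTEGER, hero_id INTEGER, kills INTEGER)"
)

# Made-up sample rows: (match_id, hero_id, kills)
rows = [(1, 5, 2), (1, 7, 9), (2, 5, 4)]
conn.executemany("INSERT INTO dota2 VALUES (?, ?, ?)", rows)
conn.commit()

# Read the rows back in a deterministic order.
stored = conn.execute(
    "SELECT * FROM dota2 ORDER BY match_id, hero_id"
).fetchall()
print(stored)
conn.close()
```
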
The features selected are the following:
After feature selection and data cleaning, this is what the working data looks like.
df_dropped.head()
Shown below are the descriptive statistics of each feature.
df_descr_stat
Before clustering, it is useful to explore the data to get a sense of what to expect from the clusters. Below is a horizontal bar plot of the 10 most frequently picked heroes in the recent DPC.
print(hero_count[:10])
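How `hero_count` was built is not shown in this version of the notebook. One possible approach, sketched here with `collections.Counter`, is to count how often each hero appears across the match rows. The `picks` list is made-up sample data and the hero labels are placeholders.

```python
from collections import Counter

# Made-up sample of hero picks, one entry per player-match row.
picks = ["hero_a", "hero_b", "hero_a", "hero_c", "hero_a", "hero_b"]

# most_common() returns (hero, count) pairs sorted by descending count.
hero_count = Counter(picks).most_common()
print(hero_count[:10])
```
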
# Plotting distributions of each feature
# Pairplot of each feature
df_dropped_normed.head()
# Plotting distributions of each feature
# Plotting normalized distributions of each feature
As the two distribution plots above show, standardization preserved the shape of each distribution while setting its mean to zero (0) and its standard deviation to one (1).
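A transformation with this effect is z-score standardization, z = (x - mean) / std. The sketch below applies it to one made-up feature column. Using the population standard deviation is an assumption, since the notebook's scaler is not shown.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up gold_per_min values for illustration.
gold_per_min = [300.0, 450.0, 600.0, 750.0]
z = standardize(gold_per_min)
print([round(v, 3) for v in z])
```

The ordering and relative spacing of the values are unchanged, which is why the shape of the distribution is preserved.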
To determine the number of clusters that best fits the data, internal validation criteria were used. Below are the internal validation criteria used to evaluate the clusters formed using K-Means.
Below are the functions used to compute the internal validation criteria.
# Plotting internal validation criteria for various k
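The specific criteria are not listed in this version of the notebook. As an illustration only, the sketch below computes two commonly used internal validation measures in plain Python: the within-cluster sum of squared errors (SSE) and the mean silhouette coefficient. The sample points and labels are made up, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def sse(points, labels):
    """Within-cluster sum of squared distances to each cluster centroid."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(column) / len(members) for column in zip(*members)]
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total


def mean_silhouette(points, labels):
    """Mean silhouette coefficient; higher means better-separated clusters."""
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Group distances from p to every other point by that point's cluster.
        by_cluster = defaultdict(list)
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster[other].append(math.dist(p, q))
        own = by_cluster.pop(label, [])
        if not own or not by_cluster:
            # Singleton cluster, or only one cluster overall: score as 0.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within p's own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = [0, 0, 1, 1]  # groups nearby points together
bad = [0, 1, 0, 1]   # splits nearby points apart
print(sse(points, good), mean_silhouette(points, good))
print(sse(points, bad), mean_silhouette(points, bad))
```

A good labeling gives a low SSE and a silhouette close to 1. Evaluating such measures over a range of k is one way to choose the number of clusters.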
After clustering each hero observation based on its game impact, the researchers aggregated the observations within each cluster by computing the mean of every feature for each unique hero. The most important features for each cluster were then identified from the cluster centroids, available through the `k_means.cluster_centers_` attribute.
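One way to read feature importance from centroids fitted on standardized data is to rank each centroid's coordinates from largest to smallest. The sketch below assumes that interpretation. The centroid values are made up, and `rank_features` is a hypothetical helper. In scikit-learn, the centroid matrix is exposed as the `cluster_centers_` attribute of a fitted `KMeans` object.

```python
feature_names = ["gold_per_min", "obs_placed", "stuns", "hero_healing"]

# Made-up centroids in standardized units, one row per cluster.
centroids = [
    [-0.2, -0.4, 0.9, 0.5],
    [-0.8, 1.4, 0.1, 0.6],
    [1.1, -0.7, -0.3, -0.5],
]


def rank_features(centroid, names):
    """Return (feature, value) pairs sorted by descending centroid value."""
    return sorted(zip(names, centroid), key=lambda pair: pair[1], reverse=True)


for cluster_id, centroid in enumerate(centroids):
    top = [name for name, _ in rank_features(centroid, feature_names)[:2]]
    print(cluster_id, top)
```
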
The most important features for each cluster are the following: for Cluster 0, stuns, hero_healing, and camps_stacked; for Cluster 1, obs_placed, sen_placed, and hero_healing; and for Cluster 2, gold_per_min, neutral_kills, and xp_per_min.
Below is the complete list of features per cluster ordered by decreasing importance.
# Complete list of features per cluster ordered by decreasing importance
Looking at the list of features above and the distributions of each cluster plotted per feature below, the researchers can define the theme of each cluster. The three (3) clusters were named Utility (Cluster 0), Support (Cluster 1), and Core (Cluster 2).
As the plots below show, most of the feature distributions clearly reflect each cluster's characteristics:
# Plotting distributions of features for each cluster
To further visualize the most important features of each cluster, the researchers drew a radar plot, shown below, with feature weights as values. The radar plot is consistent with the list of features and distribution plots discussed above.
# Making radar plots
The Utility cluster contains 117 heroes, the Support cluster 60, and the Core cluster 91. Because the same hero can be played in different roles, a hero may appear in more than one cluster.
A sample of the actual heroes included in each cluster is given below (limited to five (5) heroes only).
# Five random utility heroes
# Five random support heroes
# Five random core heroes
Now that the heroes have been clustered, we can proceed to the second part of the analysis, which is to recommend alternative picks for a banned hero. By calculating the Euclidean distance between heroes in the same cluster and feature space, we can select the k heroes most similar to the banned hero.
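The distance-based lookup can be sketched as follows. This is a minimal illustration, not the researchers' implementation: `most_similar` is a hypothetical helper, the hero labels are placeholders, and the profile vectors are made up.

```python
import math

# Made-up standardized feature profiles for heroes within one cluster.
profiles = {
    "hero_a": [1.0, 0.2, -0.5],
    "hero_b": [0.9, 0.1, -0.4],
    "hero_c": [-1.2, 0.8, 0.3],
    "hero_d": [0.5, 0.5, -0.1],
}


def most_similar(banned, profiles, k=3):
    """Return the k heroes closest to the banned hero by Euclidean distance."""
    target = profiles[banned]
    distances = [
        (name, math.dist(target, vector))
        for name, vector in profiles.items()
        if name != banned  # the banned hero itself is not a candidate
    ]
    distances.sort(key=lambda pair: pair[1])
    return distances[:k]


print(most_similar("hero_a", profiles, k=2))
```

Restricting `profiles` to a single cluster is what makes the query role-aware: the same hero queried within a different cluster is compared against a different set of candidates.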
For example, suppose a team wanted to pick Crystal Maiden as a support, but the opposing team banned it. The sample query below determines the most similar heroes.
# Alternatives to a support Crystal Maiden
As another example, suppose a team wanted to play Abaddon as a support --
# Alternatives to a support Abaddon
To show the purpose of clustering, suppose that the team wanted to play Abaddon as a core --
# Alternatives to a core Abaddon
The resulting similar heroes are fair alternatives for the queried heroes during actual DotA 2 games.
The researchers were able to identify three (3) clusters based on a hero's game impact. The formed clusters and their general characteristics are the following:
Additionally, the researchers successfully built a system that can determine alternatives to a banned hero. Based on the researchers' domain knowledge, the resulting similar heroes are fair alternatives to a queried hero during actual DotA 2 games.
For DotA 2 players, the Core cluster can be defined as Positions 1 and 2, Utility as Positions 3 and 4, and Support as Position 5.
For professional teams, apart from using this system to provide quick alternatives to a banned hero, it can also be used for theorycrafting alternative heroes to widen the pocket strategies of the team.
The researchers recommend the following to further optimize the system:
To complete the study, the researchers used the following resources as reference:
[1] Game modes. (2019). Retrieved 3 June 2019, from https://dota2.gamepedia.com/Game_modes#Captains_Mode
[2] Dota 2 Statistics. (n.d.). Retrieved 3 June 2019, from https://www.opendota.com/explorer
[3] DotA 2 Guide. (n.d.). Retrieved 3 June 2019, from https://purgegamers.true.io/g/dota-2-guide/
[4] Use Faceting for Radar Chart. (n.d.). Retrieved 23 July 2019, from https://python-graph-gallery.com/392-use-faceting-for-radar-chart/
In addition to the references used in the study, the researchers would like to acknowledge Prof. Christian Alis, PhD, Prof. Erika Legara, PhD, and Prof. Eduardo David, Jr. for mentoring us throughout the course and for sharing their knowledge on our journey to become data scientists.